We say that a population is perfectly polarized when it is divided into two groups of the same size and
opposite opinions. In this paper, we propose a methodology to study and measure the emergence of
polarization from social interactions. We begin by proposing a model to estimate opinions in which
a minority of influential individuals propagate their opinions through a social network. The result
of the model is an opinion probability density function. Next, we propose an index to quantify the
extent to which the resulting distribution is polarized. Finally, we apply the proposed methodology
to a Twitter conversation about the late Venezuelan president, Hugo Chavez, finding good
agreement between our results and offline data. Hence, we show that our methodology can detect
different degrees of polarization, depending on the structure of the network.


I. INTRODUCTION 

From a sociological point of view, polarization is a
social phenomenon that appears when individuals align
their beliefs in extreme and conflicting positions, with
few individuals holding neutral or moderate opinions
[1, 2]. Thus, as a process, it is the increase of such
divergence over time when people evaluate issues of diverse
nature [3-5], like politics or religion. In the words of John
Turner: ’Like polarized molecules, group members become
even more aligned in the direction they were already
tending’ [6].

In this paper, we propose a methodology to study the
emergence of political polarization and quantify its effects.
To this end, we introduce a model to estimate opinions,
and a polarization index that quantifies the extent to
which the resulting distribution of opinions is polarized.
We say that a population is perfectly polarized when it is
divided into two groups of the same size and with opposite
opinions. Hence, our measure of polarization is inspired
by the electric dipole moment, a measure of a charge
system’s overall polarity. For two opposed point charges,
the electric dipole moment increases with the distance
between the charges. Analogously, the polarization of two
equally populated groups depends on how distant their
views are.

As Downs argued in 1957 [7], political discussion
among individuals minimizes the cost of becoming
politically informed. In other words, sensible individuals
tend to rely on the opinions of experts instead of
analyzing information on their own. In fact, several
observational studies support this theory and suggest that the
distribution of expertise within a social network affects
political communication patterns [8]. Hence, by controlling
the opinion of a minority of influential individuals
and mapping the communication flows among the
population, we can estimate the population’s distribution
of opinions. To this end, we propose a model based on the
DeGroot model. The original model proposed by DeGroot
describes how a group of individuals might reach a shared
opinion, by iteratively updating their opinion as the
average of their current opinion and the opinions of their
neighbors. Such global coordination, without centralized
control, can also be efficiently achieved when individuals
adopt the majority state of their neighbors, even in the
presence of noise or complex topologies [?]. Recently,
the DeGroot model has been used to study the conditions
under which consensus is achieved [?]. However, as
consensus is rarely reached in the real world [?], variants
of this model can lead to a diversity of opinions [?].

In contrast to opinion generation models, such as the
voter model [20-22], we do not aim to study the evolution
of opinions, but to infer a distribution of opinions
formed on a social network from which to measure
polarization. In our model, a minority of influential individuals
propagate their opinions through a directed network,
influencing the remaining individuals. Thus, each individual
iteratively updates her opinion according to her
incoming neighbors, that is, those influencing her. Hence, by
taking advantage of complex network analysis [23], we are
able to estimate the opinions of the majority, which were
a priori unknown. The behavior of the influential
minority is similar to that of zealots in the voter model [24, 25],
but their impact on the model’s dynamics is different.
In our model, zealots, rather than preventing consensus,
allow us to infer the opinions of all the nodes in the
network. Contrary to the voter model, where opinions are
binary (0 or 1), the opinions in our model follow a
continuous distribution. In the absence of polarization, the
expected distribution of opinions would be a
narrow distribution centered at a neutral opinion. However,
as polarization emerges, the resulting distribution
shifts to a bimodal distribution with two peaks emerging
around the two dominant and confronted opinions [26].

How can political polarization be detected and therefore
be fixed? Nowadays, digital traces of human collective
behavior [28] represent an opportunity to detect and
measure different phenomena, such as polarization, in real
time. In fact, political segregation has already been
observed on political blogs [25] or Twitter [?]. Recent
research has shown that the most prominent and politically
active users mainly interact with their own partisans
[29-31], leaving little space for real debate and
cross-ideological interactions. However, segregation does not
necessarily imply polarization, as two separated groups
of people that share the same opinion cannot be considered
polarized. Hence, for a population to be
polarized, the opinions of the two groups should also be
conflicting or opposed [32]. In the latter part of this paper
we show how to apply our methodology to online data
gathered from Twitter in order to estimate individuals’
opinions and to measure the emergent political polarization.
Twitter provides an interesting context in which to
study polarization, as it hosts a wide variety of
communications, from personal messages to
those coming from traditional mass media. On this platform,
a minority of elite users concentrate much of the
collective attention, but a large fraction of the content
they produce still reaches the masses through intermediaries or
’opinion leaders’ [33]. In other words, the ’two-step flow’
of communication is still valid on Twitter [34].

We begin this paper by proposing a model to estimate
opinions in which a minority of influential individuals
propagate their opinion through a social network,
influencing the opinions of the remaining individuals. Thus,
the result of the model is a probability density function,
$p(X)$, that determines the fraction of individuals holding
an opinion $X$. Next, we introduce the polarization index
to measure the political polarization from the resulting
opinion distribution. To illustrate the power of the
methodology, we apply it to a Twitter conversation
regarding the death announcement of the Venezuelan
President (Hugo Chavez). Finally, we contrast the results
with offline data.


II. ESTIMATING OPINIONS 

We present a model to estimate the opinions of individuals
who interact on a social network, in order to obtain
their opinion distribution. In it we distinguish two types
of individuals, elite and listeners. The former have
a fixed opinion and act as seeds of influence, while the
opinion of the latter depends on their social
interactions. The model is fully specified by the following
assumptions:

1. Initial Conditions: The world is abstracted by a
directed network, $G$, in which each individual is
represented by a node and links account for influence rather
than friendship or other kinds of relationship. We define
two different subsets of nodes: $S$, accounting for the elite,
and $L$, accounting for listeners. Additionally, we endow each
elite node with a parameter, $X_s$, that determines her opinion
value and that remains constant for the duration of
the model. $X_s$ lies in the range $-1 \le X_s \le 1$, where $1$
and $-1$ represent the two extreme and confronted poles.
Finally, we assign an initially neutral opinion, $X_i(0) = 0$, to
all listeners.

2. Opinion Generation: At each iteration, elite
nodes, $S$, propagate their opinions through the
established network, $G$, influencing listeners, $L$. Hence, each
listener iteratively updates her opinion value as the mean
opinion value of her incoming neighbors. Thus the
opinion at time step $t$ of a given listener $i$ is given by the
following expression:

$$X_i(t) = \frac{\sum_j A_{ij}\, X_j(t-1)}{k_i^{in}} \tag{1}$$


where $A_{ij}$ represents the elements of the network
adjacency matrix, which is 1 if and only if there is a link from
$j$ to $i$, and $k_i^{in}$ corresponds to her indegree. The process
is repeated until all nodes converge to their respective $X_i$
value, lying in the range $-1 \le X_i \le 1$. Thus, the results
of the model are given as a density distribution of nodes’
opinion values, $p(X)$. Note that the opinion of an individual
does not depend on her own opinion in the previous step.
This is because we are estimating opinions that were a
priori unknown, rather than studying the evolution
of opinions.
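
The update rule of Eq. (1) can be sketched in a few lines of Python. This is only an illustrative sketch and not necessarily the implementation used for the results reported here: the function name `estimate_opinions`, the dictionary-based network representation, the synchronous update order, the convergence tolerance and the unweighted mean over incoming neighbors are all assumptions made for the example.

```python
def estimate_opinions(in_neighbors, elite_opinions, tol=1e-9, max_iter=10_000):
    """Iterate Eq. (1) until opinions stop changing.

    in_neighbors   -- dict: listener -> list of nodes that influence it (j -> i)
    elite_opinions -- dict: elite node -> fixed opinion in [-1, 1]
    Both argument names are hypothetical; they are not taken from the paper.
    """
    nodes = set(in_neighbors) | set(elite_opinions)
    for sources in in_neighbors.values():
        nodes.update(sources)

    # Listeners start neutral (X = 0); elite nodes keep their fixed value.
    x = {n: elite_opinions.get(n, 0.0) for n in nodes}

    for _ in range(max_iter):
        new_x = dict(x)
        for i, sources in in_neighbors.items():
            if i in elite_opinions or not sources:
                continue  # elite opinions never change
            # Mean opinion of incoming neighbors at the previous step.
            new_x[i] = sum(x[j] for j in sources) / len(sources)
        delta = max(abs(new_x[n] - x[n]) for n in nodes)
        x = new_x
        if delta < tol:
            break
    return x


# Toy network: "a" and "b" are elite nodes; "u", "v" and "w" are listeners.
in_neighbors = {
    "u": ["a", "b"],  # influenced by both poles
    "v": ["a", "u"],  # influenced by one pole and by listener "u"
    "w": ["b"],       # influenced by one pole only
}
opinions = estimate_opinions(in_neighbors, {"a": -1.0, "b": 1.0})
```

In this toy case the listener exposed to both poles stays neutral, while the listener exposed to a single pole adopts that pole's value; a histogram of the returned values plays the role of $p(X)$.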

The dynamics of the model are illustrated in Fig. 1,
where we present a schematic of the influence spreading
process. Panel A visualizes the initialization of the model,
where each elite node has been colored according to her
opinion (red, $X_s = -1$; and blue, $X_s = +1$). Panels B-E
show the dynamics of the influence process from the
initialization (B) to the final converged state (E). Panels F
and G visualize two empirical networks corresponding
to a non-polarized (F) and a polarized (G) case.


III. INTRODUCING A NEW MEASURE OF
POLARIZATION IN OPINION DISTRIBUTIONS:
THE POLARIZATION INDEX

We say that a population is perfectly polarized when it is
divided into two groups of the same size and with opposite
opinions. Hence, we propose a measure of polarization
that quantifies both effects for the $X$
distribution obtained from our model. This definition is
inspired by the electric dipole moment, a measure of a
charge system’s overall polarity. In the simplest case of
two point charges of opposite signs ($-q$ and $+q$), the
electric dipole moment is proportional to the distance between
the charges. This is analogous to a simple scenario
consisting of two persons with different ideologies, in which the
polarization depends on how conflicting their points of
view are (i.e., the distance between the two ideologies).

We begin by calculating the population associated with
each opinion (positive and negative). To this end, we
define $A^-$ as the relative population of the negative
opinions ($X < 0$). By the same token, we define $A^+$ as the
relative population of the positive opinions ($X > 0$). Hence,
both variables can be expressed as:

$$A^- = \int_{-1}^{0} p(X)\,dX = P(X < 0), \tag{2}$$

$$A^+ = \int_{0}^{1} p(X)\,dX = P(X > 0). \tag{3}$$

So we can express the normalized difference in
population sizes, $\Delta A$, as:

$$\Delta A = |A^+ - A^-| = |P(X > 0) - P(X < 0)|. \tag{4}$$

Next, we quantify the distance between the positive
and negative opinions. In other words, we measure how
different the opinions of the two sides are. To this end, we
determine the gravity centers of the negative and positive
opinions, which can be written as:

$$gc^- = \frac{\int_{-1}^{0} p(X)\,X\,dX}{\int_{-1}^{0} p(X)\,dX}, \tag{5}$$

$$gc^+ = \frac{\int_{0}^{1} p(X)\,X\,dX}{\int_{0}^{1} p(X)\,dX}, \tag{6}$$

and define the pole distance, $d$, as the normalized
distance between the two gravity centers. Hence, it can be
expressed as:

$$d = \frac{|gc^+ - gc^-|}{|X_{max} - X_{min}|} = \frac{|gc^+ - gc^-|}{2}. \tag{7}$$

This formula gives $d = 0$ when there is no separation
between the gravity centers, i.e., there are no longer two
differentiated groups and everyone shares a similar
opinion; and $d = 1$ when the two opinions are extreme and
perfectly opposed.

FIG. 1. Schema of the influence spreading process in the opinion estimation model. (A) Displays the seed nodes in the network,
colored according to their respective ideology. (B) Displays the network at t = 0, before seeds start to propagate their influence.
(C) Shows the state of the network at t = 1. (D) Shows the state of the network at t = n/2. (E) Displays the final state of the
network at t = n. (F) and (G) Visualizations of two examples of the result of the opinion formation model applied to the Venezuelan
dataset for non-polarized (F) and polarized (G) days.

FIG. 2. Schema explaining polarization and the proposed
index $\mu$. (A) Density distribution of opinions; $gc$ stands for the
gravity center of each pole, $A$ stands for the area associated
with each ideology, and $d$ stands for the pole distance. (B)
Visualization of the polarization index, $\mu$, given in Eq. (8), for
four situations.

Finally, we can use Eqs. (4) and (7) to write down a
general formula to measure polarization as a function of the
difference in size between both populations, $\Delta A$, and the
pole distance, $d$. Thus, we define the polarization index,
$\mu$, as:

$$\mu = (1 - \Delta A)\,d. \tag{8}$$

FIG. 3. Projection in a two-dimensional space of the
distribution of elite users according to the similarity of their content.
Dots represent users and colors indicate the community they
belong to in the elite network: red for the officialism and blue
for the opposition. The distance between users is inversely
proportional to the similarity of their content.

This formula gives $\mu = 1$ when the distribution is
perfectly polarized. In this case the opinion distribution
function consists of two Dirac deltas centered at $-1$ and $+1$,
respectively. Conversely, $\mu = 0$ means that the opinions
are not polarized at all, and the resulting distribution of
opinions would either take the form of a single Dirac delta
centered at a neutral opinion, or be entirely centered in
one of the poles, implying that the population ($A$) of the
other pole would be reduced to zero and $\Delta A = 1$. Notice
that for non-uniform distributions centered at a neutral
opinion, $\mu \ll 1$, but the index still presents a minimum
polarization due to a small separation between gravity centers,
which depends on the standard deviation $\sigma$. In the case of
a Gaussian distribution centered at zero, $\mu = \sqrt{2/\pi}\,\sigma$.
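
As a check on the Gaussian case, the index can be worked out directly from the definitions of the populations, the gravity centers, the pole distance and $\mu$. The derivation below is a sketch under one assumption that is not stated in the text: $\sigma \ll 1$, so that the truncation of the Gaussian to the interval $[-1, 1]$ can be neglected and the integrals can be extended to infinity.

```latex
\begin{align*}
p(X) &= \frac{1}{\sigma\sqrt{2\pi}}\, e^{-X^2/2\sigma^2},
  \qquad A^- = A^+ = \tfrac{1}{2} \;\Rightarrow\; \Delta A = 0, \\
gc^+ &= \frac{\int_0^\infty X\, p(X)\, dX}{\int_0^\infty p(X)\, dX}
      = \frac{\sigma/\sqrt{2\pi}}{1/2}
      = \sigma\sqrt{\frac{2}{\pi}},
  \qquad gc^- = -\,gc^+, \\
d &= \frac{|gc^+ - gc^-|}{2} = \sigma\sqrt{\frac{2}{\pi}},
  \qquad \mu = (1 - \Delta A)\, d = \sigma\sqrt{\frac{2}{\pi}} \approx 0.80\,\sigma .
\end{align*}
```

Under this assumption the index grows linearly with the width of the distribution, which is why a narrow, neutral distribution still yields a small but nonzero value.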

In between, polarization can lie within the range
$0 < \mu < 1$, for three reasons: i) The population sizes
associated with each opinion are equal, but the pole
distance $d$ is lower than 1. ii) Despite $d$ being equal to 1,
the population sizes associated with each opinion are
different and therefore there is a majority sharing a similar
opinion. iii) A combination of i and ii. Fig. 2A illustrates
the basic concepts of the proposed index of polarization,
as it visualizes the area associated with each opinion, their
corresponding gravity centers and the pole distance for a
standard case of a perfect bimodal distribution. In panel
B of this figure, we have visualized non-polarized
distributions ($\mu = 0$ and $\mu \ll 1$), a perfectly polarized one
($\mu = 1$) and a case in between.
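
For a finite sample of opinion values, the integrals in the definitions of $A^\pm$, $gc^\pm$, $d$ and $\mu$ reduce to counts and means. The following Python sketch illustrates this; the function name `polarization_index` is hypothetical, and two conventions are assumptions of the example rather than statements from the text: values exactly equal to zero count towards neither pole, and the index is set to zero whenever one of the poles is empty.

```python
def polarization_index(opinions):
    """Sample estimate of mu = (1 - dA) * d for opinion values in [-1, 1]."""
    negatives = [x for x in opinions if x < 0]
    positives = [x for x in opinions if x > 0]
    if not negatives or not positives:
        # Assumption: with an empty pole there are not two groups to compare.
        return 0.0

    n = len(opinions)
    a_neg = len(negatives) / n          # relative population, X < 0
    a_pos = len(positives) / n          # relative population, X > 0
    delta_a = abs(a_pos - a_neg)        # difference in population sizes

    gc_neg = sum(negatives) / len(negatives)  # gravity center of negative pole
    gc_pos = sum(positives) / len(positives)  # gravity center of positive pole
    d = abs(gc_pos - gc_neg) / 2              # pole distance, X_max - X_min = 2

    return (1 - delta_a) * d


perfectly_polarized = polarization_index([-1, -1, 1, 1])
moderate_poles = polarization_index([-0.5, -0.5, 0.5, 0.5])
unequal_sizes = polarization_index([-1, 1, 1, 1])
all_neutral = polarization_index([0, 0, 0])
```

The four calls correspond to the situations discussed above: equal extreme poles, equal but closer poles, extreme poles of unequal size, and a fully neutral population.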


IV. TWITTER DATA: THE VENEZUELAN 
CASE 

In this section, we apply our model and polarization
index to Twitter data regarding the late Venezuelan
President Hugo Chavez. We downloaded 16,383,490
messages written by 3,173,090 users from 02/04/2013 to
05/04/2013. This period covers one month preceding his
death, the announcement of the death, and the schedule
for new elections. We use retweets as a proxy for
influence [?], and build a weighted and directed network
accounting for the adoption of ideas among Twitter
users for each day. Whenever a user $i$ retweets a message
originally posted by user $j$, we assume that $i$ is being
influenced by $j$’s ideas. Hence, a new directed link ($j \to i$)
is created. We constructed an individual retweet network
for each day of the observation period, for a total
of 56 networks. More details about the dataset and the
retweet networks can be found in Appendices A and B,
respectively.

In order to apply the model to these daily networks, we
begin by defining a set of elite users. We denote as elite
those users who gained a noticeable amount of retweets
and actively participated in the conversation along the
observation period. The distribution of users according
to the total amount of retweets obtained ($s_{out}$) and
participation rate ($p$) is shown in Fig. 9 of Appendix B. In
this case, we considered a very small set (0.02%) of
influential users who participated during most of the observation
period ($p > 89\%$) and obtained a very high number of
retransmissions ($s_{out} > 1000$).
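
The two thresholds above amount to a simple filter over per-user statistics. The Python sketch below illustrates it under stated assumptions: the function name `select_elite` and its arguments are hypothetical, and the participation rate is taken to be the fraction of observed days on which a user was active, which is one plausible reading of the text rather than a definition given in it.

```python
def select_elite(out_strength, active_days, total_days,
                 min_retweets=1000, min_participation=0.89):
    """Return users above both the retweet and the participation thresholds.

    out_strength -- dict: user -> total retweets obtained (s_out)
    active_days  -- dict: user -> number of days with activity (assumed input)
    total_days   -- length of the observation period in days
    """
    elite = set()
    for user, s_out in out_strength.items():
        participation = active_days.get(user, 0) / total_days
        if s_out > min_retweets and participation > min_participation:
            elite.add(user)
    return elite


# Made-up toy accounts, used only to exercise the filter.
out_strength = {"news": 5000, "viral": 20000, "fan": 12}
active_days = {"news": 55, "viral": 3, "fan": 56}
elite = select_elite(out_strength, active_days, total_days=56)
```

An account that went viral on a few days, or one that was active every day but rarely retweeted, is excluded; only sustained and widely retransmitted accounts remain.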

The elite users mainly correspond to politicians,
journalists and mass media accounts, whose political position
and editorial tendency are publicly known and who
belong to both sides of the Venezuelan political spectrum.
In order to assign them an ideology value, $X_s$, we first
studied their network of interactions. In the elite
network, nodes represent the elite users, and links are
created and accumulated whenever an elite user $i$ retweets
an elite user $j$. This network is polarized in a well-defined
two-community structure, with modularity $Q = 0.38$. In
each community, users share political ideology and hardly
interact with users from the other pole. In fact, the
assortative mixing [42] by political ideology is very high
($r = 0.88$).

In order to further understand the elite polarization,
we analyzed the content of their messages. For this
purpose, we abstracted each elite user as a high-dimensional
vector, where each element represents the number of
times that the user posted each of the 500 most used
words from all the elite’s messages. Then, we reduced
the high-dimensional space into a two-dimensional one,
by applying a multi-dimensional scaling algorithm [43].
In this algorithm, users are mapped into a new space by
preserving the distance between them in the original one.
This means that the distance between users is inversely
proportional to the similarity of their posted contents. In
Fig. 3 we present the projection of the users in the new
two-dimensional space. Dots represent users and colors
are assigned according to the community they belong to
in the elite network. It can be noticed that these users
are not homogeneously distributed in the new space.
Instead, they are separated from each other in agreement
with our previous classification. This means that the use
of language is polarized among the elite users.

FIG. 4. Time evolution of ideological value ($X_i$) probability density functions ($p(X)$) for the Venezuelan conversation. Labels
indicate the day of observation, D standing for the day of the President’s death. Colors indicate the number of participants.
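
The vector representation described above, together with a pairwise dissimilarity that a multi-dimensional scaling routine could take as input, can be sketched as follows. The helper names `word_count_vectors` and `cosine_distance` are hypothetical, whitespace tokenization is a simplification, and the choice of cosine distance is an assumption: the text does not state which distance was used.

```python
import math
from collections import Counter


def word_count_vectors(messages_by_user, vocab_size=500):
    """Represent each user by counts of the most used words overall."""
    totals = Counter()
    per_user = {}
    for user, messages in messages_by_user.items():
        counts = Counter(w for msg in messages for w in msg.lower().split())
        per_user[user] = counts
        totals.update(counts)
    vocab = [word for word, _ in totals.most_common(vocab_size)]
    vectors = {u: [c[w] for w in vocab] for u, c in per_user.items()}
    return vectors, vocab


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Made-up messages, used only to illustrate the representation.
messages = {"x": ["patria patria"], "y": ["patria"], "z": ["libertad"]}
vectors, vocab = word_count_vectors(messages)
d_xy = cosine_distance(vectors["x"], vectors["y"])
d_xz = cosine_distance(vectors["x"], vectors["z"])
```

Users with the same vocabulary profile end up at distance zero and users with disjoint vocabularies at distance one; the full matrix of such distances is the kind of input from which a two-dimensional projection can be computed.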

After identifying the elite users, we assigned them
ideology values of $X_s = -1$ for the officialism side and
$X_s = 1$ for the opposition. The remaining users (99.98%)
were assigned the role of listeners and $X_i = 0$. After
running the model we obtained an ideology probability
density function $p(X)$ for each day. The resulting $p(X)$
for each network are presented in Fig. 4. The label
indicates the day of observation, D representing the day of
the death. The color indicates the network size in terms
of the number of participants. As can be seen, the days
with the largest participation (purple and blue) correspond
to the most important announcements: the President’s
death (day D), and the call for elections (day D + 6). Next,
we calculated the polarization index ($\mu$), pole distance
($d$) and population sizes for the resulting distributions
of each day and plotted the results in Fig. 5.

We identify day D as a turning point after which the
conversation became even more polarized. During the days
preceding the announcement (from D - 29 to D - 1), $X$
presents a bimodal distribution in which the officialism
population (negative side of the $X$ distribution) is
considerably smaller than the opposition (positive side of the
$X$ distribution). This means that during this period the
conversation was still polarized, but practically
monopolized by the opposition. Hence, despite the fact that
the pole distance reached values over 0.9, the
polarization index averaged just under 0.4. Then a shift in the
emergent patterns of the conversation took place on the day
of the President’s death announcement (day D). During
this day $X$ lost its bimodal distribution, and the
resulting $p(X)$ was centered around neutral values,
minimizing the pole distance. This means that the
conversation was less polarized and that the network
no longer had a two-island structure. Therefore,
the polarization index decreased to $\mu \approx 0.25$. This
behavior is due to the bursty growth of the conversation
on day D (see Fig. 7 in Appendix B). As a
consequence, the previously segregated modules combined
into a single-island structure, many times larger than the
usual network size. Besides, a large number of users from
all around the globe joined the conversation, making
the topic international rather than local to Venezuela.
In fact, during this day the percentage of users tweeting
from Venezuela ($\approx 20\%$) was very low in comparison to
the rest of the days (above 80% on average). Hence,
our set of Venezuelan elite was not capable of
polarizing this majority of worldwide users. However, from
then on the conversation recovered its bimodal
distribution of opinions. Moreover, the polarization reached its
maximum from day D + 12 (marked with the dashed line)
onwards, the day on which the new officialism leader entered the
conversation. From this day onwards $X$ presents a
bimodal distribution, where the populations of both sides
are similar. Therefore, the polarization index averaged
values around 0.9.


V. TWITTER SHOWS THE TWO SIDES OF 
VENEZUELA 

Next we evaluate our model and the validity of Twitter
data by comparing the geographic distribution of the
polarized users with offline data regarding the Venezuelan
socioeconomic and political landscape. More specifically,
we analyze the geographical density of geolocated tweets
in Caracas, the capital city of Venezuela, taking the
results obtained from the most polarized days in Section
IV as a proxy of the users’ ideology. For this purpose, we
have built the density functions of the probability that a
tweet associated with the officialism or the opposition had
been posted by a geolocated user at a given position
(longitude and latitude). We considered a grid of 100 cells
between longitudes [-67.12°, -66.71°] and latitudes
[10.31°, 10.57°] and counted the number of tweets in each
cell identified with each ideology. Then, we normalized both
counts by their respective total number of tweets. The
resulting functions are two surfaces on top of the map, which
we show in Fig. 6 as contour plots (red for the
officialism and blue for the opposition) that indicate lines
of equal value in the 2-D probability density function.
These contour lines are superimposed on a map of the
municipalities composing the city of Caracas. There are
five of them, bordered in green. The labels correspond to
the municipality name, and the color indicates the ruling
party: the officialism in Libertador and the opposition
in Chacao, Sucre, Baruta and El Hatillo. Additionally,
urbanized areas are colored in yellow and poorer regions
(slums) in pink. Notice that the West region is
characterized by lower income and is governed by the
officialism, while the East part is wealthier and governed
by the opposition.

FIG. 5. Time evolution of the polarization index $\mu$ (C), and
the variables associated with it: difference in population sizes
$\Delta A$ (A) and pole distance $d$ (B) for the Venezuelan conversation.
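
The gridded density described above can be sketched as a normalized two-dimensional histogram. In the Python example below, the function name `grid_density` is hypothetical, and the layout of the 100 cells as a regular 10 x 10 grid is an assumption; the text only gives the total number of cells and the bounding box. Points outside the box are ignored.

```python
def grid_density(points, lon_range=(-67.12, -66.71), lat_range=(10.31, 10.57),
                 n_lon=10, n_lat=10):
    """Fraction of tweets per grid cell; rows index latitude, columns longitude."""
    lon_min, lon_max = lon_range
    lat_min, lat_max = lat_range
    counts = [[0] * n_lon for _ in range(n_lat)]
    total = 0
    for lon, lat in points:
        if not (lon_min <= lon <= lon_max and lat_min <= lat <= lat_max):
            continue  # outside the bounding box
        # min(...) keeps points lying exactly on the upper edge in the last cell.
        col = min(int((lon - lon_min) / (lon_max - lon_min) * n_lon), n_lon - 1)
        row = min(int((lat - lat_min) / (lat_max - lat_min) * n_lat), n_lat - 1)
        counts[row][col] += 1
        total += 1
    if total == 0:
        return [[0.0] * n_lon for _ in range(n_lat)]
    return [[c / total for c in row] for row in counts]


# Made-up coordinates for one ideology; the last point lies outside the box.
points = [(-67.10, 10.32), (-67.10, 10.32), (-66.72, 10.56), (0.0, 0.0)]
density = grid_density(points)
```

Running the function once per ideology gives two normalized surfaces whose level lines can then be drawn as contours over the map.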

It can be noticed that the regions where each pole
concentrates most of its tweets are well separated from
each other, showing that the city presents a clear
geographical polarization. In fact, there is a good
correspondence between the results of our model and offline
evidence, such as electoral results or socioeconomic
factors. Those municipalities governed by the opposition
contain the highest concentration of users identified with
this pole, and the same effect occurs for the officialism
side of the political spectrum. We also remark
that the areas with a higher concentration of users aligned
with the officialism correspond to the parts of the city
with the largest concentration of poorer neighborhoods
(pink areas). Conversely, the opposition users
concentrate in urban developed regions. All this suggests
that the basis of the Venezuelan popular polarization
resides in socioeconomic factors and that the political
conflict in Venezuela presents a strong territorial facet.

FIG. 6. Geographical polarization in the city of Caracas.
Contour lines represent the density functions of the
probability that a tweet associated with the officialism (red) or the
opposition (blue) had been posted by a geolocated user at a
given position (latitude and longitude). These contours have
been superimposed on the map of Caracas, Venezuela. From
inside out, contours indicate the following values: [0.175, 0.15,
0.125, 0.10, 0.075, 0.05]. The green lines border the five
municipalities composing the city. Labels indicate the name of
the municipality and the color indicates the ruling party
according to the 2013 Venezuelan local elections (red for the
officialism party and blue for the opposition parties). White
represents unpopulated areas, yellow urbanized areas and pink
the poorer neighborhoods.

VI. CONCLUSIONS 

Modern democracies have to represent the conflicts
existing in our society, while at the same time
maintaining social stability [?]. However, as polarization
emerges, the few most powerful parties tend to capture
the whole of the public attention and support, silencing
moderate opinions and under-representing minorities.
Consequently, today’s society is concerned about
polarization, as a politically polarized society entails several
risks. These risks include the appearance of radicalism
or civil wars. In fact, one of the current challenges and
a cutting-edge topic is how to detect the emergence of
political polarization and how to fix it.

We argue that the possibility of gathering user-generated
data from social media platforms [28], together with
network science [43], represents an opportunity to detect
political polarization. In this work, we have proposed
a methodology to study and measure the emergence of
polarization from social interactions. We have used it
to analyze the political polarization in one of the most
polarized countries: Venezuela [46, 47]. We have done
this by applying our methods to a Twitter conversation
about the late Venezuelan president Hugo Chavez. We
have shown that our methodology is able to detect
different degrees of polarization in the conversation,
depending on the participants’ behavior, given by the structure
of the network. Finally, we have contrasted our results
against offline data, such as municipality governments or
socioeconomic factors, finding a good correlation between
the online and offline polarization. Hence, we conclude
that online data seem to be a good proxy to detect
politically polarized societies, as the online polarization that
we found is a reflection of the Venezuelan political,
territorial and social polarization.

Another relevant question is: Can social media
platforms help reduce political polarization as more voices
could be heard? Although we do not answer this
question, our results show that a minority of elite users were
able to influence the whole online social network,
resulting in a highly politically polarized conversation.
However, these local Venezuelan influential accounts were not
capable of polarizing the network when the conversation
stopped being local to Venezuela and became
international. This opens two questions that can be studied
from a social media analysis perspective: i) How does
online political polarization change at different scales, such as
city, country, continent or whole world? ii) How could we
design targeted interventions and control strategies on social
media that might be implemented to reduce polarization?
APPENDIX A: DATASETS

In this work, we analyze messages from the online
social network Twitter. We downloaded data from a
temporal index of tweets managed by the Search API v1 [48],
whose limitations are specified in terms of query
complexity and frequency, instead of a fixed percentage
of the main stream. We queried for messages mentioning
the name of the late Venezuelan President Hugo Chavez,
during the events that surrounded his illness and death
in 2013. We considered a two-month period from
February 4th, 2013 (29 days before the death announcement)
to April 4th, 2013 (26 days after the death
announcement). In summary, we downloaded 16,383,490 messages
posted by 3,173,090 users from more than 159 countries
(according to the 0.4% of geographically located
messages). Our analyses are based on those messages that
represent retweets (49% of the downloaded content) and
more specifically those that constitute the larger
components of the communication networks, which were posted
by 57% of the original set of users.

The Venezuelan Internet penetration represents about
40% of the population, where most users belong to
the middle and lower-middle classes [?]. Online social networks
are very popular in this country. Around 33% of
Venezuelans use Facebook [49] and almost 10% use Twitter [?].
In fact, Venezuela ranks thirteenth out of all countries in
number of Twitter users [?]. Moreover, Venezuela has
the highest proportion of mobile Internet in Latin
America at over 30% of total connections, due to the popular
use of social media from mobile phones [?].

The political usage of Twitter in Venezuela is of great
importance and has played a fundamental role in
recent Venezuelan history [52, 53]. The late President Hugo
Chavez was considered to be the second most influential
world leader on Twitter [64], preceded only by the US
President Barack Obama. The collective that opposes
the late President also finds in social media a channel
to speak freely to its supporters and protest against
the Government [?].


APPENDIX B: NETWORKS

We have built one retweet network for each day of the
observation period (56 networks). A retweet network
emerges from user-to-user interactions during the
message retransmission process provided by Twitter. Nodes
represent users and links are created between users $i$ and
$j$ when $i$ forwards the content previously posted by $j$.
Edges are weighted in proportion to the frequency with which
$i$ retweeted $j$’s messages, and directed in the sense of
the flow of information from the message source $j$ to the
retweeter $i$.
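
The construction of one daily network can be sketched as a weighted edge count. In the Python example below, the function name `build_retweet_network` and the input format, a list of (retweeter, source) pairs, are hypothetical, and the weight is taken to be the raw number of retweets, which is the simplest choice consistent with "in proportion to the frequency".

```python
from collections import defaultdict


def build_retweet_network(retweets):
    """Map each directed link (source j, retweeter i) to its retweet count.

    retweets -- iterable of (retweeter, source) pairs observed in one day
    """
    weights = defaultdict(int)
    for retweeter, source in retweets:
        # Information flows from the original author to the retweeter.
        weights[(source, retweeter)] += 1
    return dict(weights)


# Made-up retweet events: user "i" retweets "j" twice, user "k" once.
retweets = [("i", "j"), ("i", "j"), ("k", "j")]
network = build_retweet_network(retweets)
```

Keeping the direction from source to retweeter means that the incoming neighbors of a node are exactly the accounts it has retweeted, which is the set of influences used by the opinion model.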

A single network contains several retransmission
cascades, seeded and propagated by the conversation
participants. When these cascades are aggregated, several
disconnected network components emerge. Among them there
is a single Giant Component (GC), whose size is of the
same order as that of the whole network. Within the GC,
the set of nodes reachable from the influential elite
represents about 50% of the GC’s size (Fig. 7A). On most
days, the number of reachable nodes fluctuated around
10,000 users, and it grew explosively to almost 500,000
users during day D (Fig. 7B). This behavior is typical of
breaking news and critical events [44, 55], with a bursty
increase during the main occurrence and a slow decay that
may last for several days.

FIG. 7. Time evolution of the relative number of reachable
nodes in comparison to the GC (A) and size of the reachable
nodes’ networks (B).
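
A minimal sketch of how the giant component and the reachable set could
be extracted from a directed edge list. It assumes that the GC is the
largest weakly connected component and that reachability follows edge
direction (source to retweeter), with the seeds counted as reachable;
the function names and the toy edges are hypothetical.

```python
from collections import defaultdict, deque


def giant_component(edges):
    """Largest weakly connected component of a directed edge list."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, best = set(), set()
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search ignoring edge direction.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in neighbours[node]:
                if nb not in component:
                    component.add(nb)
                    queue.append(nb)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def reachable_from(edges, seeds):
    """Nodes reachable from `seeds` following edges source -> retweeter."""
    successors = defaultdict(set)
    for u, v in edges:
        successors[u].add(v)
    reached, queue = set(seeds), deque(seeds)
    while queue:
        node = queue.popleft()
        for nb in successors[node]:
            if nb not in reached:
                reached.add(nb)
                queue.append(nb)
    return reached


# Toy network: "a" plays the role of the elite seed.
edges = [("a", "b"), ("b", "c"), ("d", "c"), ("e", "f")]
gc = giant_component(edges)
reach = reachable_from(edges, seeds={"a"}) & gc
fraction = len(reach) / len(gc)
```

Here `fraction` plays the role of the relative number of reachable nodes
with respect to the GC.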

The retweet networks characterize the way collective
attention is organized during an event on Twitter. The
out strength ($s_{out}$) indicates the number of retweets
gained by a participant, while the in strength ($s_{in}$)
indicates the number of retweets made by the participant.
In Fig. 8 we have superimposed the complementary
cumulative distribution functions (CCDF) of the out
strength (top) and in strength (bottom) for each of the
constructed networks, in log-log (left) and linear-log
(right) scales. In both cases the distributions are
heterogeneous, with the out strength distributions being
broader than the in strength distributions. To test
whether these distributions behave like an exponential
rather than a power law, we applied the likelihood ratio
test [56, 57]. The hypothesis that a distribution follows
an exponential curve instead of a power law has a
p-value < 0.01 for more than 98% of the outgoing
distributions and 75% of the incoming distributions, and
over 87% of the distributions have a p-value < 0.05.
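
To illustrate the kind of comparison a likelihood ratio test performs,
the sketch below contrasts a power law with an exponential on a single
sample. It is not necessarily the procedure of [56, 57]: it assumes
continuous distributions, a fixed lower cutoff `xmin` (a hypothetical
parameter), maximum-likelihood fits, and a normal approximation for the
significance of the sign of the ratio. Retweet counts are discrete, so a
careful analysis would use discrete likelihoods and an estimated cutoff.

```python
import math
import random


def loglikelihood_ratio(data, xmin=1.0):
    """Compare a power law against an exponential for values >= xmin.

    Returns (R, p). R > 0 favours the power law, R < 0 the exponential;
    p is the two-sided significance of the sign of R.
    """
    x = [v for v in data if v >= xmin]
    n = len(x)
    logs = [math.log(v / xmin) for v in x]
    alpha = 1.0 + n / sum(logs)            # power-law exponent (MLE)
    lam = 1.0 / (sum(x) / n - xmin)        # exponential rate (MLE)
    # Pointwise log-likelihoods under each fitted model.
    ll_pl = [math.log((alpha - 1.0) / xmin) - alpha * l for l in logs]
    ll_exp = [math.log(lam) - lam * (v - xmin) for v in x]
    diffs = [a - b for a, b in zip(ll_pl, ll_exp)]
    R = sum(diffs)
    mean = R / n
    sigma = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    p = math.erfc(abs(R) / (math.sqrt(2.0 * n) * sigma))
    return R, p


# Synthetic samples (not data from the paper).
random.seed(0)
pl_sample = [(1.0 - random.random()) ** (-1.0 / 1.5) for _ in range(2000)]
exp_sample = [1.0 + random.expovariate(1.0) for _ in range(2000)]
R_pl, p_pl = loglikelihood_ratio(pl_sample)
R_exp, p_exp = loglikelihood_ratio(exp_sample)
```

The sign of `R` indicates which model describes the sample better, and a
small `p` indicates that this sign is unlikely to be a statistical
fluctuation.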

From a dynamical point of view, the power law
distributions suggest a preferential attachment mechanism
[45], where the chance of being retweeted increases with
the number of retweets previously gained. These dynamics
result in heterogeneous distributions in which the great
majority of users receive a very small amount of the
collective attention, while a few users receive a
disproportionately larger amount of it. For example, on
every day 50% of the population gained at most 2 or 3
retweets (dotted lines in the top left panel of Fig. 8),
while the 1% most retweeted participants gained at least
130 to 430 retweets (dashed lines in the top left panel
of Fig. 8).

FIG. 8. Complementary cumulative distribution function
(CCDF) of the retweet networks’ out strength $s_{out}$
(top) and in strength $s_{in}$ (bottom), from the Twitter
conversation about the Venezuelan President Hugo Chavez,
in log-log (left) and linear-log (right) scale. The colors
indicate the corresponding day of the observation period.
The dotted line indicates the range of $s_{out}$ for 50%
of the population, while the dashed lines indicate the
range of $s_{out}$ for 1% of the population.
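
The two ranges quoted above (the cap for the lower half of the
population and the floor for the top 1%) can be read off a ranked list of
out strengths. The sketch below shows one way to do it; the function name
and the toy values are hypothetical and are not the paper's data.

```python
def strength_thresholds(s_out_values):
    """Return (median_cap, top1_floor) for one day's out strengths.

    median_cap: largest out strength among the least retweeted 50%.
    top1_floor: smallest out strength among the most retweeted 1%.
    """
    ranked = sorted(s_out_values)
    n = len(ranked)
    median_cap = ranked[(n + 1) // 2 - 1]
    top_count = max(1, n // 100)       # at least one user in the top 1%
    top1_floor = ranked[n - top_count]
    return median_cap, top1_floor


# Toy day: 197 users with few retweets and 3 heavily retweeted users.
toy = [1] * 120 + [2] * 60 + [3] * 17 + [150, 300, 420]
median_cap, top1_floor = strength_thresholds(toy)
```

Applying this function to each daily network would give one pair of
thresholds per day, which is what the dotted and dashed ranges summarize.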

To further understand the relationship between
individual activity and the attention received, we
aggregate over the observation period and characterize
individuals according to their rate of participation and
the total number of retweets they gained. The
participation rate is defined as:


$$p = \frac{p_i}{T} \tag{9}$$

where $p_i$ is the number of days on which user i actively
participated in the retweet process and T is the total
length of the observation period. The total number of
retweets gained by user i is measured as:


$$S_{out} = \sum_{t=0}^{T} s_{out}(t) \tag{10}$$

where $s_{out}(t)$ is the out strength of node i at day
t. If the user did not actively participate at day t,
then $s_{out}(t) = 0$.

FIG. 9. Joint probability density function of the accumulated
out strength ($S_{out}$) and the participation rate (p), from
the Twitter conversation about the Venezuelan President Hugo
Chavez. Colors correspond to the density of users. The grey
rectangle marked at the top right corner indicates the elite
users defined in section IV.
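
Equations (9) and (10) can be computed per user from the daily out
strengths. In the sketch below the input layout is an assumption: one
dictionary per day mapping each user who was active that day to their
out strength, with T taken as the number of days supplied. The function
name and the toy values are hypothetical.

```python
def aggregate_users(daily_s_out):
    """Return user -> (p, S_out) over the observation period.

    `daily_s_out` is a list with one dict per day, mapping every user
    who was active that day to their out strength (which may be 0 for
    users who only retweeted others). Days on which a user is absent
    contribute s_out(t) = 0.
    """
    T = len(daily_s_out)
    days_active, total = {}, {}
    for day in daily_s_out:
        for user, s in day.items():
            days_active[user] = days_active.get(user, 0) + 1
            total[user] = total.get(user, 0) + s
    return {u: (days_active[u] / T, total[u]) for u in days_active}


# Toy observation period of four days.
daily = [{"a": 3, "b": 0}, {"a": 2}, {"a": 1, "c": 5}, {}]
users = aggregate_users(daily)
```

Each user then has a participation rate p and an accumulated out
strength, which are the two axes of Fig. 9.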


The joint probability density function of the
accumulated out strength $S_{out}$ and the participation
rate p, $P(S_{out}, p)$, is shown in Fig. 9. This
distribution indicates the total amount of attention
received by users according to their participation rate.
The largest density of users (red and orange dots in
Fig. 9) participated on less than 20% of the days
(p < 0.2) and present a small out strength
($S_{out} < 10$), which means that most of them received
a small amount of the collective attention. However,
there is a very small set of users at the upper right
corner of Fig. 9 who participated almost every day and
present an extremely high $S_{out}$. This minority of
highly influential users captured most of the collective
attention throughout the observation period, and defines
the elite users considered in section IV.
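
A rough sketch of how a joint distribution like the one in Fig. 9 could
be tabulated, and how a region at its upper right corner could be
selected. The binning (ten equal bins in p, decades in accumulated out
strength) and the thresholds `p_min` and `s_min` are hypothetical
choices for illustration, not the values used to define the elite. The
result is a fraction of users per cell, not a density normalized by bin
area.

```python
import math
from collections import Counter


def joint_histogram(users, p_bins=10):
    """Fraction of users per (participation bin, decade of S_out) cell.

    `users` maps user -> (p, S_out). Users with S_out = 0 are collected
    in decade -1.
    """
    counts = Counter()
    for p, s in users.values():
        p_bin = min(int(p * p_bins), p_bins - 1)
        decade = int(math.log10(s)) if s >= 1 else -1
        counts[(p_bin, decade)] += 1
    n = len(users)
    return {cell: c / n for cell, c in counts.items()}


def select_region(users, p_min, s_min):
    """Users with participation rate >= p_min and S_out >= s_min."""
    return {u for u, (p, s) in users.items() if p >= p_min and s >= s_min}


# Toy users with (p, S_out) values.
toy_users = {"u1": (0.05, 5), "u2": (0.1, 3), "u3": (0.95, 2500), "u4": (0.05, 0)}
hist = joint_histogram(toy_users)
top_right = select_region(toy_users, p_min=0.9, s_min=1000)
```

With real data, most of the mass would be expected in the low-p,
low-strength cells, and `select_region` would isolate the small group at
the opposite corner.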